\subsection{Differentials and first order changes}
Recall that for a function \(f(u_1, \dots, u_n)\), we define the differential of \(f\), written \(\dd{f}\), by
\[
	\dd{f} = \frac{\partial f}{\partial u_i} \dd{u}_i
\]
noting that the summation convention applies.
The \(\dd{u}_i\) are called differential forms, which can be thought of as linearly independent objects (if the coordinates \(u_1, \dots, u_n\) are independent), i.e.\ \(\alpha_i \dd{u}_i = 0 \implies \alpha_i = 0\) for all \(i\).
Similarly, if we have a vector \(\vb x(u_1, \dots, u_n)\), we define
\[
	\dd \vb x = \frac{\partial \vb x}{\partial u_i} \dd{u}_i
\]
As an example, let \(f(u, v, w) = u^2 + w \sin(v)\).
Then
\[
	\dd{f} = 2u \dd{u} + w \cos(v) \dd{v} + \sin(v) \dd{w}
\]
Similarly, given
\[
	\vb x(u, v, w) = \begin{pmatrix}
		u^2 - v^2 \\ w \\ e^v
	\end{pmatrix}
\]
we can compute
\[
	\dd \vb x = \begin{pmatrix}
		2u \\ 0 \\ 0
	\end{pmatrix} \dd{u} + \begin{pmatrix}
		-2v \\ 0 \\ e^v
	\end{pmatrix} \dd{v} + \begin{pmatrix}
		0 \\ 1 \\ 0
	\end{pmatrix} \dd{w}
\]
Differentials encode information about how a function (or vector field) changes when we change the coordinates by a small amount.
By calculus,
\[
	f(u_1 + \delta u_1, \dots, u_n + \delta u_n) - f(u_1, \dots, u_n) = \frac{\partial f}{\partial u_i} \delta u_i + o(\delta \vb u)
\]
So if \(\delta f\) denotes the change in \(f(u_1, \dots, u_n)\) under this small change in coordinates, we have, to first order,
\[
	\delta f \approx \frac{\partial f}{\partial u_i}\delta u_i
\]
The analogous result holds for vector fields:
\[
	\delta \vb x \approx \frac{\partial \vb x}{\partial u_i}\delta u_i
\]
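As a quick numerical check of the first-order formula (the point and increments below are chosen purely for illustration), take \(f(u, v, w) = u^2 + w \sin(v)\) from the earlier example at \((u, v, w) = (1, 0, 2)\), with \(\delta u = \delta v = 0.1\) and \(\delta w = 0\).
The first-order estimate is
\[
	\delta f \approx 2u \,\delta u + w \cos(v) \,\delta v + \sin(v) \,\delta w = 0.2 + 0.2 + 0 = 0.4
\]
while the exact change is
\[
	f(1.1, 0.1, 2) - f(1, 0, 2) = 1.21 + 2 \sin(0.1) - 1 \approx 0.4097
\]
The discrepancy of roughly \(0.01\) is second order in the increments, consistent with the \(o(\delta \vb u)\) error term.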

\subsection{Coordinates and line elements in \(\mathbb R^2\)}
We can create multiple different consistent coordinate systems by defining a relationship between them.
For example, polar coordinates \((r, \theta)\) and Cartesian coordinates \((x, y)\) can be related by
\[
	x = r \cos \theta;\quad y = r \sin \theta
\]
Even though this relationship is not bijective (every pair \((0, \theta)\) maps to the origin), polar coordinates are still useful because the correspondence is well-behaved away from the origin.
Even coordinate systems with countably many badly-behaved points are still useful.

A general set of coordinates \((u, v)\) on \(\mathbb R^2\) can be specified by their relationship to the standard Cartesian coordinates \((x, y)\).
We must specify smooth, invertible functions \(x(u, v)\), \(y(u, v)\).
We would also like a small change in one coordinate system to correspond to a small change in the other (i.e.\ the inverse is also smooth).
The same principle applies in \(\mathbb R^3\) for three coordinates, for example.

Consider the standard Cartesian coordinates in \(\mathbb R^2\).
\[
	\vb x(x, y) = \begin{pmatrix}
		x \\ y
	\end{pmatrix} = x \vb e_x + y \vb e_y
\]
Note that \(\{\vb e_x, \vb e_y\}\) are orthonormal, and point in the same direction regardless of the value of \(\vb x\): \(\vb e_x\) points in the direction of changing \(x\) with \(y\) held constant, for example.
Equivalently,
\[
	\vb e_x = \frac{\frac{\partial}{\partial x} \vb x(x, y)}{\abs{\frac{\partial}{\partial x} \vb x(x, y)}};\quad \vb e_y = \frac{\frac{\partial}{\partial y} \vb x(x, y)}{\abs{\frac{\partial}{\partial y} \vb x(x, y)}}
\]
Note that
\[
	\dd \vb x = \frac{\partial \vb x}{\partial x}\dd{x} + \frac{\partial \vb x}{\partial y} \dd{y} = \dd{x} \vb e_x + \dd{y} \vb e_y
\]
In other words, when applying the change in coordinate \(x \mapsto x + \delta x\), the vector changes (to first order) to \(\vb x \mapsto \vb x + \delta x \vb e_x\).
In fact, in the case of Cartesian coordinates, this change is precisely correct for any size of \(\delta x\), since the coordinate basis vectors are the same everywhere.
We call \(\dd \vb x\) the line element; it tells us how small changes in coordinates produce changes in position vectors.

Now, let us consider polar coordinates in two-dimensional space.
We can use the same idea as before, giving
\[
	\vb e_r = \frac{\frac{\partial}{\partial r} \vb x(r, \theta)}{\abs{\frac{\partial}{\partial r} \vb x(r, \theta)}} = \begin{pmatrix}
		\cos\theta \\ \sin\theta
	\end{pmatrix};\quad \vb e_\theta = \frac{\frac{\partial}{\partial \theta} \vb x(r, \theta)}{\abs{\frac{\partial}{\partial \theta} \vb x(r, \theta)}} = \begin{pmatrix}
		-\sin\theta \\ \cos\theta
	\end{pmatrix}
\]
Therefore, we have
\[
	\vb x(r, \theta) = \begin{pmatrix}
		r \cos\theta \\ r \sin\theta
	\end{pmatrix} = r\vb e_r
\]
Note that \(\{\vb e_r, \vb e_\theta\}\) are also orthonormal at each \((r, \theta)\), but their exact values are not the same everywhere.
Since the basis vectors are orthogonal, we can call \(r\) and \(\theta\) orthogonal curvilinear coordinates.
Also, we can compute the line element \(\dd \vb x\) as
\[
	\dd \vb x = \frac{\partial \vb x}{\partial r} \dd{r} + \frac{\partial \vb x}{\partial \theta} \dd{\theta} = \begin{pmatrix}
		\cos \theta \\ \sin \theta
	\end{pmatrix} \dd{r} + \begin{pmatrix}
		-r \sin \theta \\ r \cos \theta
	\end{pmatrix} \dd{\theta} = \dd{r} \, \vb e_r + r\dd{\theta} \, \vb e_\theta
\]
We see that a change in \(\theta\) produces (up to first order) a change \(\vb x \mapsto \vb x + r \,\delta \theta \,\vb e_\theta\), a change proportional to \(r\).
So a small change in \(\theta\) could cause quite a large change in Cartesian coordinates.
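As a short worked illustration of the line element (the circle of radius \(R\) is an example chosen here, not part of the derivation above), consider moving around the circle \(r = R\).
Along this curve \(\dd{r} = 0\), so
\[
	\dd \vb x = R \dd{\theta} \, \vb e_\theta;\quad \abs{\dd \vb x} = R \dd{\theta}
\]
since \(\vb e_\theta\) is a unit vector.
Adding up these lengths over one full revolution gives
\[
	\int_0^{2\pi} R \dd{\theta} = 2 \pi R
\]
which is the familiar circumference.
This suggests that the factor of \(r\) multiplying \(\dd{\theta}\) is exactly what converts a change in angle into a length.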

\subsection{Orthogonal curvilinear coordinates}
We say that \((u, v, w)\) are a set of orthogonal curvilinear coordinates if the vectors
\[
	\vb e_u = \frac{\frac{\partial \vb x}{\partial u}}{\abs{\frac{\partial \vb x}{\partial u}}};\quad \vb e_v = \frac{\frac{\partial \vb x}{\partial v}}{\abs{\frac{\partial \vb x}{\partial v}}};\quad \vb e_w = \frac{\frac{\partial \vb x}{\partial w}}{\abs{\frac{\partial \vb x}{\partial w}}}
\]
form a right-handed, orthonormal basis for each \((u, v, w)\); but not necessarily the same basis at every point.
It is standard to write
\[
	h_u = \abs{\frac{\partial \vb x}{\partial u}};\quad h_v = \abs{\frac{\partial \vb x}{\partial v}};\quad h_w = \abs{\frac{\partial \vb x}{\partial w}}
\]
We call \(h_u, h_v, h_w\) the scale factors.
Note that the line element is
\begin{align*}
	\dd \vb x & = \frac{\partial \vb x}{\partial u}\dd{u} + \frac{\partial \vb x}{\partial v}\dd{v} + \frac{\partial \vb x}{\partial w} \dd{w} \\
	          & = h_u \vb e_u \dd{u} + h_v \vb e_v \dd{v} + h_w \vb e_w \dd{w}
\end{align*}
So the scale factors show how first-order changes in the coordinates are scaled into changes in \(\vb x\).
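One consequence worth recording, as a sketch that uses only the orthonormality of \(\{\vb e_u, \vb e_v, \vb e_w\}\) and treats products of differentials informally as products of small increments, is the squared length of the line element:
\[
	\dd \vb x \cdot \dd \vb x = h_u^2 \dd{u}^2 + h_v^2 \dd{v}^2 + h_w^2 \dd{w}^2
\]
The cross terms vanish because \(\vb e_u \cdot \vb e_v = \vb e_v \cdot \vb e_w = \vb e_w \cdot \vb e_u = 0\).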

\subsection{Cylindrical polar coordinates}
We define \((\rho, \phi, z)\) by
\[
	\vb x(\rho, \phi, z) = \begin{pmatrix}
		\rho \cos \phi \\
		\rho \sin \phi \\
		z
	\end{pmatrix}
\]
where \(0 \leq \rho; 0 \leq \phi < 2 \pi; z \in \mathbb R\).
So we can find
\[
	\vb e_\rho = \begin{pmatrix}
		\cos \phi \\ \sin \phi \\ 0
	\end{pmatrix};\quad \vb e_\phi = \begin{pmatrix}
		-\sin \phi \\ \cos \phi \\ 0
	\end{pmatrix};\quad \vb e_z = \begin{pmatrix}
		0 \\ 0 \\ 1
	\end{pmatrix}
\]
The scale factors are
\[
	h_\rho = 1;\quad h_\phi = \rho;\quad h_z = 1
\]
The line element is
\[
	\dd \vb x = \dd{\rho} \, \vb e_\rho + \rho \dd{\phi} \, \vb e_\phi + \dd{z} \, \vb e_z
\]
Note that
\[
	\vb x = \rho \begin{pmatrix}
		\cos \phi \\ \sin \phi \\ 0
	\end{pmatrix} + z \begin{pmatrix}
		0 \\ 0 \\ 1
	\end{pmatrix} = \rho \vb e_\rho + z \vb e_z
\]

\subsection{Spherical polar coordinates}
We define \((r, \theta, \phi)\) by
\[
	\vb x(r, \theta, \phi) = \begin{pmatrix}
		r \cos \phi \sin \theta \\
		r \sin \phi \sin \theta \\
		r \cos \theta
	\end{pmatrix}
\]
where \(0 \leq r; 0 \leq \theta \leq \pi; 0 \leq \phi < 2 \pi\).
So we can find
\[
	\vb e_r = \begin{pmatrix}
		\cos \phi \sin \theta \\ \sin \phi \sin \theta \\ \cos \theta
	\end{pmatrix};\quad \vb e_\theta = \begin{pmatrix}
		\cos \phi \cos \theta \\ \sin \phi \cos \theta \\ -\sin \theta
	\end{pmatrix};\quad \vb e_\phi = \begin{pmatrix}
		-\sin \phi \\ \cos \phi \\ 0
	\end{pmatrix}
\]
The scale factors are
\[
	h_r = 1;\quad h_\theta = r;\quad h_\phi = r \sin \theta
\]
The line element is
\[
	\dd \vb x = \dd{r} \, \vb e_r + r \dd{\theta} \, \vb e_\theta + r \sin \theta \dd{\phi} \, \vb e_\phi
\]
Note that
\[
	\vb x = r \begin{pmatrix}
		\cos \phi \sin \theta \\ \sin \phi \sin \theta \\ \cos \theta
	\end{pmatrix} = r \vb e_r
\]

\subsection{Gradient operator}
For \(f \colon \mathbb R^3 \to \mathbb R\), we define the gradient of \(f\), written \(\grad f\), by
\begin{equation}
	f(\vb x + \vb h) = f(\vb x) + \grad f(\vb x) \cdot \vb h + o(\vb h)
	\tag{\(\ast\)}
\end{equation}
as \(\abs{\vb h} \to 0\).
The directional derivative of \(f\) in the direction \(\vb v\), denoted by \(D_{\vb v} f\) or \(\frac{\partial f}{\partial \vb v}\), is defined by
\[
	D_{\vb v} f(\vb x) = \lim_{t \to 0} \frac{f(\vb x + t\vb v) - f(\vb x)}{t}
\]
Alternatively,
\begin{equation}
	f(\vb x + t\vb v) = f(\vb x) + t D_{\vb v}f(\vb x) + o(t)
	\tag{\(\dagger\)}
\end{equation}
as \(t \to 0\).
Setting \(\vb h = t\vb v\) in \((\ast)\), we have
\[
	f(\vb x + t\vb v) = f(\vb x) + t \grad f(\vb x) \cdot \vb v + o(t)
\]
This gives another way to interpret the gradient of \(f\).
Comparing this result to \((\dagger)\), we see that
\[
	D_{\vb v} f = \vb v \cdot \grad f
\]
By the Cauchy--Schwarz inequality, for \(\vb v\) of fixed length the dot product \(\vb v \cdot \grad f\) is maximised when \(\vb v\) is parallel to \(\grad f\).
Hence, among unit vectors \(\vb v\), the directional derivative is maximised when \(\vb v\) points in the direction of \(\grad f\).
So \(\grad f\) points in the direction of greatest increase of \(f\).
Similarly, \(-\grad f\) points in the direction of greatest decrease of \(f\).
For example, suppose \(f(\vb x) = \frac{1}{2}\abs{\vb x}^2\).
Then
\[
	f(\vb x + \vb h) = \frac{1}{2}(\vb x + \vb h)\cdot (\vb x + \vb h) = \frac{1}{2}\abs{\vb x}^2 + \frac{1}{2}(2\vb x \cdot \vb h) + \frac{1}{2}\abs{\vb h}^2 = f(\vb x) + \vb x \cdot \vb h + o(\vb h)
\]
Hence \(\grad f(\vb x) = \vb x\).
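To illustrate the directional derivative with this example (the point and directions below are chosen purely for illustration), take \(\vb x = (1, 2, 2)\), so that \(\abs{\vb x} = 3\).
Since \(D_{\vb v} f = \vb v \cdot \grad f = \vb v \cdot \vb x\), we find
\[
	D_{\vb e_z} f = 2;\quad D_{\vb x / \abs{\vb x}} f = \frac{\vb x \cdot \vb x}{\abs{\vb x}} = \abs{\vb x} = 3
\]
So moving radially outwards gives the largest rate of increase among unit directions, as expected from \(\grad f = \vb x\).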

\subsection{Gradient on curves}
Suppose we have a curve \(t \mapsto \vb x(t)\).
How does some function \(f\) change when moving along the curve?
We will write \(F(t) = f(\vb x(t))\) and \(\delta \vb x = \vb x(t + \delta t) - \vb x(t)\).
\begin{align*}
	F(t + \delta t) & = f(\vb x(t + \delta t))                                               \\
	                & = f(\vb x(t) + \delta \vb x)                                           \\
	                & = f(\vb x(t)) + \grad f(\vb x(t)) \cdot \delta \vb x + o(\delta \vb x) \\
	\intertext{Since \(\delta \vb x = \vb x'(t) \,\delta t + o(\delta t)\), we have}
	F(t + \delta t) & = F(t) + \vb x'(t) \cdot \grad f(\vb x(t)) \,\delta t + o(\delta t)
\end{align*}
In other words,
\[
	\frac{\dd{F}}{\dd{t}} = \frac{\dd}{\dd{t}}f(\vb x(t)) = \frac{\dd \vb x}{\dd{t}} \cdot \grad f(\vb x(t))
\]
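As a worked check of this formula (the helix below is an example chosen here for illustration), let \(\vb x(t) = (\cos t, \sin t, t)\) and \(f(\vb x) = \frac{1}{2}\abs{\vb x}^2\), so that \(\grad f = \vb x\).
Directly, \(F(t) = \frac{1}{2}(\cos^2 t + \sin^2 t + t^2) = \frac{1}{2}(1 + t^2)\), so \(F'(t) = t\).
Using the formula instead,
\[
	\frac{\dd \vb x}{\dd{t}} \cdot \grad f(\vb x(t)) = \begin{pmatrix}
		-\sin t \\ \cos t \\ 1
	\end{pmatrix} \cdot \begin{pmatrix}
		\cos t \\ \sin t \\ t
	\end{pmatrix} = -\sin t \cos t + \sin t \cos t + t = t
\]
in agreement with the direct computation.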

\subsection{Gradient on surfaces}
Suppose we have a surface \(S\) in \(\mathbb R^3\) defined implicitly by
\[
	S = \{ \vb x \in \mathbb R^3 : f(\vb x) = 0 \}
\]
If \(t \mapsto \vb x(t)\) is any curve in \(S\), then \(f(\vb x(t)) = 0\) for all \(t\).
So
\[
	0 = \frac{\dd}{\dd{t}}f(\vb x(t)) = \grad f(\vb x(t)) \cdot \frac{\dd \vb x}{\dd{t}}
\]
So \(\grad f(\vb x(t))\), the gradient, is orthogonal to \(\frac{\dd \vb x}{\dd{t}}\), the tangent vector of any chosen curve in \(S\).
So \(\grad f(\vb x(t))\) is normal to the surface.
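As a simple illustration (the sphere is an example chosen here), take \(f(\vb x) = \abs{\vb x}^2 - a^2\) with \(a > 0\), so that \(S\) is the sphere of radius \(a\) centred at the origin.
Expanding as in the earlier example,
\[
	f(\vb x + \vb h) = f(\vb x) + 2 \vb x \cdot \vb h + \abs{\vb h}^2
\]
so \(\grad f(\vb x) = 2 \vb x\).
The normal to the sphere at \(\vb x\) is therefore parallel to \(\vb x\), i.e.\ it points radially, as we would expect geometrically.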

\subsection{Coordinate-independent representation}
If we are working in an orthogonal curvilinear coordinate system \((u, v, w)\), it is not immediately clear how to compute \(\grad f\), since we need to represent this arbitrary perturbation \(\vb h\) using \((u, v, w)\).
In Cartesian coordinates it is simple; to represent the change \(\vb x \mapsto \vb x + \vb h\) we simply add the components of \(\vb x\) and \(\vb h\).
\begin{align*}
	f(\vb x + \vb h) & = f((x + h_1, y + h_2, z + h_3))                                                                                                  \\
	                 & = f(\vb x) + \frac{\partial f}{\partial x} h_1 + \frac{\partial f}{\partial y} h_2 + \frac{\partial f}{\partial z} h_3 + o(\vb h) \\
	                 & = f(\vb x) + \begin{pmatrix}
		\partial f / \partial x \\ \partial f / \partial y \\ \partial f / \partial z
	\end{pmatrix} \cdot \vb h + o(\vb h)
\end{align*}
So we have
\[
	\grad f = \begin{pmatrix}
		\partial f / \partial x \\ \partial f / \partial y \\ \partial f / \partial z
	\end{pmatrix}
\]
Or, using suffix notation,
\[
	\grad f = \vb e_i \frac{\partial f}{\partial x_i};\quad [\grad f]_i = \frac{\partial f}{\partial x_i}
\]
We see that this \(\grad\) is a kind of vector differential operator.
In Cartesian coordinates,
\[
	\grad = \vb e_x \frac{\partial}{\partial x} + \vb e_y \frac{\partial}{\partial y} + \vb e_z \frac{\partial}{\partial z} \equiv \vb e_i \frac{\partial}{\partial x_i}
\]
From our previous example,
\[
	f(\vb x) = \frac{1}{2}(x^2 + y^2 + z^2) = \frac{1}{2}\abs{\vb x}^2
\]
\begin{align*}
	[\grad f]_i & = \frac{\partial}{\partial x_i}\left[ \frac{1}{2} x_j x_j \right] \\
	            & = \frac{1}{2} \left[ \delta_{ij} x_j + x_j \delta_{ij} \right]    \\
	            & = x_i                                                             \\
	\grad f     & = \vb e_i x_i
\end{align*}
Let us return to computing the gradient in the general case.
Recall that in Cartesian coordinates, the line element is simple:
\[
	\dd \vb x = \dd{x}_i \vb e_i
\]
And also, if we have a function on \(\mathbb R^3\) such as \(f(x, y, z)\), it has the differential
\[
	\dd{f} = \frac{\partial f}{\partial x_i}\dd{x}_i
\]
Then,
\begin{align*}
	\grad f \cdot \dd \vb x & = \left( \vb e_i \frac{\partial f}{\partial x_i} \right) \cdot \left( \vb e_j \dd{x}_j \right) \\
	                        & = \frac{\partial f}{\partial x_i} \left( \vb e_i \cdot \vb e_j \right) \dd{x}_j                \\
	                        & = \frac{\partial f}{\partial x_i} \delta_{ij} \dd{x}_j                                         \\
	                        & = \frac{\partial f}{\partial x_i} \dd{x}_i                                                     \\
	                        & = \dd{f}
\end{align*}
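Since neither \(\grad f\) (defined by \((\ast)\)) nor the differentials \(\dd{f}\) and \(\dd \vb x\) depend on the choice of coordinates, this relation holds in \textit{any} set of coordinates: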
\[
	\grad f \cdot \dd \vb x = \dd{f}
\]

\subsection{Computing the gradient vector}
\begin{proposition}
	If \((u, v, w)\) are orthogonal curvilinear coordinates, and \(f\) is a function of position expressed in terms of \((u, v, w)\), then
	\[
		\grad f = \frac{1}{h_u}\frac{\partial f}{\partial u}\vb e_u + \frac{1}{h_v}\frac{\partial f}{\partial v}\vb e_v + \frac{1}{h_w}\frac{\partial f}{\partial w}\vb e_w
	\]
\end{proposition}
\begin{proof}
	If \(f = f(u, v, w)\) and \(\vb x = \vb x(u, v, w)\), then
	\[
		\dd{f} = \frac{\partial f}{\partial u}\dd{u} + \frac{\partial f}{\partial v}\dd{v} + \frac{\partial f}{\partial w}\dd{w}
	\]
	\[
		\dd \vb x = h_u \dd{u} \vb e_u + h_v \dd{v} \vb e_v + h_w \dd{w} \vb e_w
	\]
	Using the above result, we have
	\[
		\grad f \cdot \dd \vb x = \dd{f}
	\]
	\[
		\left( (\grad f)_u \vb e_u + (\grad f)_v \vb e_v + (\grad f)_w \vb e_w \right) \cdot \left( h_u \dd{u} \vb e_u + h_v \dd{v} \vb e_v + h_w \dd{w} \vb e_w \right) = \frac{\partial f}{\partial u}\dd{u} + \frac{\partial f}{\partial v}\dd{v} + \frac{\partial f}{\partial w}\dd{w}
	\]
	\[
		(\grad f)_u h_u \dd{u} + (\grad f)_v h_v \dd{v} + (\grad f)_w h_w \dd{w} = \frac{\partial f}{\partial u}\dd{u} + \frac{\partial f}{\partial v}\dd{v} + \frac{\partial f}{\partial w}\dd{w}
	\]
	Since \(u, v, w\) are independent coordinates, \(\dd{u}, \dd{v}, \dd{w}\) are linearly independent.
	So we can simply compare coefficients, getting
	\[
		\grad f = \frac{1}{h_u}\frac{\partial f}{\partial u}\vb e_u + \frac{1}{h_v}\frac{\partial f}{\partial v}\vb e_v + \frac{1}{h_w}\frac{\partial f}{\partial w}\vb e_w
	\]
	as required.
\end{proof}
In cylindrical polar coordinates, we have
\[
	\grad f = \frac{\partial f}{\partial \rho} \vb e_\rho + \frac{1}{\rho} \frac{\partial f}{\partial \phi} \vb e_\phi + \frac{\partial f}{\partial z} \vb e_z
\]
In spherical polar coordinates, we have
\[
	\grad f = \frac{\partial f}{\partial r} \vb e_r + \frac{1}{r} \frac{\partial f}{\partial \theta} \vb e_\theta + \frac{1}{r\sin\theta} \frac{\partial f}{\partial \phi} \vb e_\phi
\]
Then using the familiar example \(f(\vb x) = \frac{1}{2}\abs{\vb x}^2\), we have
\[
	f = \begin{cases}
		\frac{1}{2}(x^2 + y^2 + z^2) & \text{in Cartesian coordinates}         \\
		\frac{1}{2}(\rho^2 + z^2)    & \text{in cylindrical polar coordinates} \\
		\frac{1}{2}r^2               & \text{in spherical polar coordinates}   \\
	\end{cases}
\]
Then we can check the value of \(\grad f\) in these different coordinate systems.
\begin{align*}
	\grad f & = \begin{cases}
		x \vb e_x + y \vb e_y + z \vb e_z & \text{in Cartesian coordinates}         \\
		\rho \vb e_\rho + z \vb e_z       & \text{in cylindrical polar coordinates} \\
		r \vb e_r                         & \text{in spherical polar coordinates}   \\
	\end{cases} \\
	        & = \vb x
\end{align*}
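As a further check where the angular terms contribute (the function \(f = z\) is an example chosen here for illustration), write \(f = r \cos \theta\) in spherical polar coordinates.
The formula for \(\grad f\) in spherical polar coordinates gives
\[
	\grad f = \cos \theta \, \vb e_r + \frac{1}{r}(-r \sin \theta) \, \vb e_\theta + 0 \, \vb e_\phi = \cos \theta \, \vb e_r - \sin \theta \, \vb e_\theta
\]
Substituting the Cartesian components of \(\vb e_r\) and \(\vb e_\theta\),
\[
	\cos \theta \begin{pmatrix}
		\cos \phi \sin \theta \\ \sin \phi \sin \theta \\ \cos \theta
	\end{pmatrix} - \sin \theta \begin{pmatrix}
		\cos \phi \cos \theta \\ \sin \phi \cos \theta \\ -\sin \theta
	\end{pmatrix} = \begin{pmatrix}
		0 \\ 0 \\ 1
	\end{pmatrix} = \vb e_z
\]
which agrees with the Cartesian computation \(\grad z = \vb e_z\).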
